Hardware acceleration for a software transactional memory system

ABSTRACT

A method and apparatus for accelerating transactional execution. Barriers associated with shared memory lines referenced by memory accesses within a transaction are only invoked/executed the first time the shared memory lines are accessed within the transaction. Hardware support, such as a transaction field/transaction bits, is provided to determine if an access is the first access to a shared memory line during a pendency of a transaction. Additionally, in an aggressive operational mode, version numbers representing versions of elements stored in shared memory lines are not stored and validated upon commitment, to save on validation cost. Moreover, even in a cautious mode, which stores version numbers to enable validation, validation costs may not be incurred if eviction of accessed shared memory lines does not occur during execution of the transaction.

FIELD

This invention relates to the field of processor execution and, in particular, to acceleration of transactional execution.

BACKGROUND

Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. An integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.

As an example, a single integrated circuit may have one or multiple cores. The term core usually refers to the ability of logic on an integrated circuit to maintain an independent architecture state, where each independent architecture state is associated with at least some dedicated execution resources. As another example, a single integrated circuit or a single core may have multiple logical processors for executing multiple software threads, which is also referred to as a multi-threading integrated circuit or a multi-threading core. Multiple logical processors usually share common data caches, instruction caches, execution units, branch predictors, control logic, bus interfaces, and other processor resources, while maintaining a unique architecture state for each logical processor.

The ever increasing number of cores and logical processors on integrated circuits enables more software threads to be executed. However, the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among the software threads. One common solution to accessing shared data in multiple core or multiple logical processor systems comprises the use of locks to guarantee mutual exclusion across multiple accesses to shared data. However, the ever increasing ability to execute multiple software threads potentially results in false contention and a serialization of execution.

Another solution is using transactional execution to access shared memory to execute instructions and operate on data. Often transactional execution includes speculatively executing a grouping of a plurality of micro-operations, operations, or instructions. During speculative execution of a transaction by a processor, core, or thread, the memory locations read from and written to are tracked to see if another processor, core, or thread accesses those locations. If another thread invalidly alters those locations, the transaction is restarted and re-executed from the beginning. Transactional execution potentially avoids deadlock associated with traditional locking mechanisms, provides error recovery, and makes fine-grained synchronization possible.

Previously, transactional execution has been implemented either fully in hardware, which requires complex and expensive logic but is relatively fast, or in software, which is less expensive and more robust but incurs significant performance overhead in certain situations. For example, software transactional memory is able to efficiently execute nested transactions, but a significant amount of execution time and resources is wasted due to the instrumentation of memory accesses inside a transaction. The instrumentation is to ensure that different transactions access disjoint memory locations. For example, when a single thread is running, in certain cases, software transactional memory incurs a 2-3× performance overhead compared to a traditional lock-based implementation. In software-implemented systems, the greatest overhead is typically found in tracking load accesses to locations and validating locations accessed before committing a transaction.

In contrast, in a hardware-only transactional memory system, a transaction may be executed faster, as software is not needed to track each access; however, transaction size as well as functionality is sacrificed, because of the expensive and complex circuitry/logic required. Some recent research proposals have focused on forms of hybrid transaction execution where a transaction is first executed in hardware and, upon failure, executed in software. However, some performance features that are achieved through software still have to incur the overhead associated with executing the transaction in hardware first, before the advantages are realized.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not intended to be limited by the figures of the accompanying drawings.

FIG. 1 illustrates an embodiment of a processor for providing accelerated transactional execution.

FIG. 2 illustrates another embodiment of a processor for providing accelerated transactional execution.

FIG. 3 illustrates an embodiment of an underlying system including a multi-resource microprocessor and a higher layer abstraction of pseudo code for a transaction.

FIG. 4 illustrates an embodiment of a method for accelerating execution of a transaction.

FIG. 5 illustrates another embodiment of a method for accelerating execution of a transaction.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth, such as examples of specific hardware support for transactional execution, specific types of local memory in processors, and specific types of memory accesses and locations, in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods, such as coding of transactions in software, demarcation of transactions, architectures of multi-core and multi-threaded processors, and specific operational details of microprocessors, have not been described in detail in order to avoid unnecessarily obscuring the present invention.

The method and apparatus described herein are for accelerating execution of transactions in a processor. However, the methods and apparatus for accelerating execution of transactions in a processor are not so limited, as they may be implemented on or in association with any integrated circuit device or system, such as cell phones, personal digital assistants, embedded controllers, mobile platforms, desktop platforms, and server platforms.

Processors

Referring to FIG. 1, an embodiment of processing element 100, which is capable of transactional execution, is illustrated. Processing element 100 may be any element for executing instructions or operating on data. Examples of processing element 100 include a processor, a microprocessor, a multi-resource host processor, a processing core, a logical processor, an embedded processor, a multi-threaded processor, and a multi-core processor.

In one of the examples below, reference to a multi-resource processor is made. Often a resource is referred to as a processor core, logical processor, or threaded processor. Consequently, a multi-resource processor includes a processor with multiple cores, logical processors, threads, or any combination thereof. A core, as used herein, refers to any logic located on an integrated circuit capable of maintaining an independent architecture state, wherein each independently maintained architecture state is associated with at least some dedicated execution resources. In contrast, a logical processor typically refers to any logic located on an integrated circuit capable of maintaining an independent architecture state, wherein the independently maintained architecture states share access to execution resources. Often, both a core and a logical processor are capable of executing a thread. Therefore, a multi-resource processor may also refer to any processor capable of executing multiple threads.

Processor 100 may include any combination of cores or threads, such as a multi-core processor where each core supports execution of multiple software threads. Note that processor 100 is capable of individual execution within a system or may be combined with other processing elements in a multiple physical processor system. In one embodiment, to support speculative execution of transactions, processor 100 is capable of speculative execution. Other potential execution capabilities of processor 100 include in-order execution, out-of-order execution, serial execution, parallel execution, fixed point execution, floating-point execution, or other well-known types of execution. Specific examples of execution logic and resources are discussed below in reference to the execution resources section.

Shared Memory/Cache

Memory 110 is also illustrated in FIG. 1 coupled to execution resources 105. Memory 110 includes any storage elements or devices to be accessed by execution resources 105, such as cores, logical processors, or threads. In one embodiment, memory 110 is a shared memory shared by at least two processing resources, such as a core, thread, logical processor, or remote agent. Examples of memory 110 include a cache, a plurality of registers, a register file, a static random access memory (SRAM), a plurality of latches, or other storage elements. Note that processor 100 or any processing resources on processor 100 may be addressing a system memory location, a virtual memory address, a physical address, or other address when reading from or writing to a memory location within memory 110. Memory 110 will be discussed in more detail in reference to the exemplary embodiments below.

As a specific illustrative example, assume that memory 110 is a cache memory, such as a trace cache, a first-level cache, a second-level cache, or a higher-level cache. Cache 110 includes cache lines 111, 112, and 113, which may also be referred to as memory locations within memory 110. Cache 110 and lines 111-113 may be organized in any manner, such as a fully associative cache, a set-associative cache, a direct mapped cache, or other known cache organization.

As another example, assume memory 110 is a plurality of registers used by a processing element or resource as execution space or scratch pad to store variables, instructions, or data. In this example, memory locations 111-113 in grouping of registers 110 are registers 111, 112, and 113.

In one embodiment, lines, locations, or words 111-113 in memory 110 are each capable of storing one element. An element refers to any instruction, operand, data operand, variable, or other grouping of logical values that is commonly stored in memory. In an alternative embodiment, memory lines 111-113 are each capable of storing a plurality of elements per line. As an example, cache line 111 stores four elements including an instruction and two operands. The elements stored in cache line 111 may be in a packed or compressed state, as well as an uncompressed state. Multiple elements per shared memory line are discussed in more detail below in reference to FIG. 2.

Logical Values

As stated above, memory 110, as well as other features and devices in processor 100, store and operate on logic values. Often, the use of logic levels, logic values, or logical values is also referred to as 1's and 0's, which simply represents binary logic states. For example, a 1 refers to a high logic level and 0 refers to a low logic level. Other representations of values in computer systems have been used, such as decimal and hexadecimal representation of logical values or binary values. For example, take the decimal number 10, which is represented in binary values as 1010 and in hexadecimal as the letter A.
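As a non-limiting illustration, the equivalence of the three representations in the example above may be checked with a short Python sketch. The variable names are hypothetical and are not part of the described embodiments.

```python
# The decimal number 10 from the example above.
value = 10

# Render the same value as binary digits and as a hexadecimal digit.
binary_digits = format(value, "b")
hex_digit = format(value, "X")

# Parsing either representation recovers the original decimal value.
from_binary = int(binary_digits, 2)
from_hex = int(hex_digit, 16)
```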

In many older systems, a high logic level was represented by a voltage, e.g. 5V, and a low logic level by another voltage, e.g. 0V. As another specific example, a high logic level is at 1.2V and a low logic level is at 0.3V. However, a high logic/voltage level may refer to any voltage level above a threshold value, and inversely, a low logic level may refer to any voltage level below the threshold value. In addition, there may be more than two logical levels in a cell, transistor, or waveform. As an example, a single waveform may represent four different logical values at different voltage levels.

Execution Module/Resources

FIG. 1 also illustrates execution resources 105, which are to execute transactions. Execution resources 105 may also refer to hardware, logic, or modules to support transactional execution. As an example, execution resources 105 are to execute a first transaction and a second transaction nested in the first transaction. A transaction is nested within another transaction when, either in software or hardware, a begin transaction demarcation for an inner transaction is within a transaction demarcation for an outer transaction.

Other common modules, logic, and functional units not illustrated in FIG. 1 may also be included, but are not required to be included, in processor 100, such as any one or any combination of the following: a data path, an instruction path, a virtual memory address translation unit (a translation buffer), an arithmetic logic unit (ALU), a floating point calculation unit capable of executing a single instruction or multiple instructions, as well as capable of operating on single or multiple data operands in serial or in parallel, a register, an interrupt controller, an advanced programmable interrupt controller (APIC), a pre-fetch unit, a fetch unit, a decode unit, a cache, an instruction retirement unit, an instruction re-order unit, and any other logic that is used for fetching, executing, or operating on instructions and/or data.

Transactions

Transactional execution usually includes grouping a plurality of instructions or operations into a transaction, atomic section of code, or a critical section of code. In some cases, use of the word instruction refers to a macro-instruction which is made up of a plurality of micro-operations. There are commonly two ways to identify transactions. The first includes demarcating the transaction in software. Here, some software demarcation is included in code to be identified during execution. In another embodiment, which may be implemented in conjunction with the foregoing software demarcation, transactions are grouped by hardware or recognized by instructions indicating a beginning of a transaction and an end of a transaction.

In a processor, a transaction is either executed speculatively or non-speculatively. In the latter case, a grouping of instructions is executed with some form of lock or guaranteed valid access to shared memory locations to be accessed. In the alternative, speculative execution of a transaction is more common, where a transaction is speculatively executed and committed upon the end of the transaction. A pendency of a transaction, as used herein, refers to the period in which a transaction has begun execution and has not been committed or aborted, i.e. is pending. For example, if a begin transaction instruction is executed for an outer transaction and then another begin transaction instruction is executed for a nested inner transaction, the inner nested transaction is still pending until an associated end transaction instruction is executed and the transaction is committed. Therefore, any accesses at the level of the outer transaction are performed during a pendency of the outer transaction, and the outer transaction is still pending until it is committed or aborted.

Previously, transactional execution included two basic steps: (1) check a state of a lock associated with a memory access; and (2) validate memory locations accessed before committing the transaction. In fact, in a purely software transactional execution environment, a software transactional memory (STM) maintains an array of locks, to which every memory location is mapped through some association, such as a hashing function. Usually, upon an access in a transaction, the STM checks a lock, remembers values associated with the location to be accessed and the lock, and before committing the transaction validates whether the lock has been acquired by another transaction during execution. More information on a purely software implemented STM may be found in “McRT-STM: A High Performance Software Transactional Memory System for a Multi-core Runtime,” by Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, and Ben Hertzberg, presented at the Proceedings of Principles and Practice of Parallel Programming (PPoPP) 2006.

Acceleration Module

Referring still to FIG. 1, acceleration module 120 is shown coupled to memory 110. Note that a module may be implemented in hardware, software, firmware, or any combination thereof. Furthermore, module boundaries commonly vary, and functions are implemented together, as well as separately, in different embodiments. As an example, which is discussed in more detail below, acceleration module 120 re-vectors execution in a transaction to a software based barrier in lock module 115, such as a lock associated with a line in memory 110, based on a transaction bit associated with the line of memory 110 to be accessed in the transaction. Additionally, acceleration module 120 may include logic to generate an interrupt if a line in memory 110, which is accessed during a transaction, is evicted before commitment, and a handler executing on execution resources 105 to handle the interrupt and abort a transaction.

In another example, acceleration module 120 includes logic to set a state, such as a carry flag, based on a transaction bit associated with a line of memory 110, a software application to inspect the carry flag and decide whether or not to invoke a barrier, and a counter to keep count of a number of lines that were accessed and evicted inside a transaction.

From these examples, it is readily apparent that acceleration module 120 may include hardware, such as a transaction bit, software, such as an array of locks maintained in a memory, or firmware, and may also vary across boundaries, such as including lock module 115, a transaction bit which is present in memory 110, logic in processor 100 to generate an eviction interrupt or set a carry flag, a counter to keep track of the number of memory lines accessed inside a transaction, and a handler executed on execution resources 105.

In one embodiment, acceleration module 120 is to determine if an access to a shared line is the first access to the shared line during execution of the transaction. Determining if an access to a line, such as line 112 in memory 110, is a first access to line 112 during execution of a transaction may be done by any method of tracking accesses to a line of memory.

Transaction Field/Bit

In one embodiment, each line of memory 110 is associated with a transaction field. Turning quickly to FIG. 2, transaction field 220 is illustrated as part of acceleration module 225. Transaction field 220 is associated with shared memory line 211 in memory 210 and is to represent whether memory line 211 has been previously accessed during execution of a transaction or is being accessed for the first time.

Transaction field 220 may be implemented in hardware, software, or firmware. In one embodiment, transaction field 220 includes a bit or a plurality of bits within shared memory line 211. In the example illustrated in FIG. 2, transaction field 220 is the four most significant bits in a cache line of cache 210. However, a transaction field is not so limited, as it may be implemented in a register, local memory, or other storage device on processor 200 and associated with cache line 211 through a mapping of bits 221 to line 211.

Assuming shared memory 210 is a cache, each cache line, such as line 211, is capable of storing a plurality of elements. In the example shown, cache line 211 is capable of storing four elements, 211 a, 211 b, 211 c, and 211 d. An element, as discussed above, may include any instruction, operand, grouping of logical values, or any combination thereof. As a specific example, cache line 211 includes an instruction stored in 211 a requiring the addition of two data operands stored in 211 b and 211 c, and the result is to be stored in 211 d.

Here, transaction field 220 includes four transaction bits, which are shown as transaction bits 221. Each one of transaction bits 221 corresponds to an element in line 211, as illustrated with the dashed lines from transaction bits 221 to elements 211 a-211 d. As a consequence, it is possible to determine whether a single element in line 211 is being accessed for the first time or has been previously accessed during the transaction. However, any configuration or number of bits or values in transaction field 220 may represent a first or subsequent access to any one, combination, or all of the elements in line 211. As an example, when two bits are used, the four combinations of 00, 01, 11, and 10 are used to reference each element.
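As a non-limiting illustration, a transaction field with one bit per element may be modeled in software as a small bit mask. The following Python sketch assumes a cleared bit means "not yet accessed" and a set bit means "accessed"; the class and method names are hypothetical and are not part of the described hardware.

```python
ELEMENTS_PER_LINE = 4  # the four elements of line 211 in the example


class TransactionField:
    """Hypothetical model: one transaction bit per element of a line."""

    def __init__(self):
        self.bits = 0  # all bits clear: no element accessed yet

    def is_first_access(self, element_index):
        # A clear bit is assumed to mean the element has not been accessed.
        return (self.bits >> element_index) & 1 == 0

    def mark_accessed(self, element_index):
        # Set only the bit that corresponds to the accessed element.
        self.bits |= 1 << element_index


field = TransactionField()
first = field.is_first_access(3)   # fourth element, not yet accessed
field.mark_accessed(3)
again = field.is_first_access(3)   # same element, now marked
other = field.is_first_access(0)   # a different element is unaffected
```

Tracking each element separately lets a first access to one element invoke a barrier without treating accesses to neighboring elements in the same line as already covered.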

To illustrate how a transaction field, such as transaction field 220, may operate without complicating and obscuring the discussion, refer back to FIG. 1, where operation of a single transaction bit, not specifically shown, will be discussed. It is readily apparent that a transaction field with a plurality of bits and shared memory lines with a plurality of elements may operate in a very similar manner as in the example below. Therefore, assume there is a transaction bit associated with each line in memory 110. Association of a transaction bit with line 111 includes the transaction bit being a part of line 111 or elsewhere in processor 100 and mapped to line 111.

By default, the transaction bit is set to a first value, such as a first logical value. In this default state, the transaction bit represents that cache line 111 has not been accessed during execution of a transaction, i.e. during a pendency of a transaction. Upon an access to cache line 111, which includes a write, store, read, or load to cache line 111 or a system memory location associated with cache line 111, the transaction bit is set to a second value, such as a second logical value. In one embodiment, the first value is a high logical value and the second value is a low logical value. Alternatively, the first value is a low logical value and the second value is a high logical value. Analogously, in a transaction field with a plurality of transaction bits, each bit may be set or cleared to represent whether an element in a shared memory line has been accessed.

Consequently, if the transaction bit associated with line 111 is checked, and the transaction bit represents the first value, then cache line 111 has not been accessed during a pendency of the transaction. Inversely, if the transaction bit represents the second value, then cache line 111 has been previously accessed during the transaction. Upon commitment of the transaction, the bits set to the second value are cleared to ensure the values are set to the first value, i.e. the default state. In one embodiment, a resource ID, such as a core ID or thread ID, as well as a transaction ID may also be stored or associated with the transaction bit to identify which transaction is accessing cache line 111 or previously accessed cache line 111. The acceleration of transactional execution based on a first or subsequent access is illustrated below through optimization of re-vectoring to barriers before accessing locations in shared memory and in validation of those locations before committing a transaction.
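As a non-limiting illustration, the life cycle of a single transaction bit per line, from the default state through an access to clearing at commitment, may be sketched in Python as follows. The encoding of the first and second values as 0 and 1, and all names used, are assumptions made for the sketch only.

```python
FIRST_VALUE = 0   # assumed encoding of the default state
SECOND_VALUE = 1  # assumed encoding of "accessed in this transaction"

# One transaction bit for each of lines 111, 112, and 113.
transaction_bits = {111: FIRST_VALUE, 112: FIRST_VALUE, 113: FIRST_VALUE}


def record_access(line):
    """Set the line's bit and report whether this was the first access."""
    was_first = transaction_bits[line] == FIRST_VALUE
    transaction_bits[line] = SECOND_VALUE
    return was_first


def commit():
    """Clear every bit back to the default state upon commitment."""
    for line in transaction_bits:
        transaction_bits[line] = FIRST_VALUE


history = [record_access(111), record_access(111), record_access(112)]
commit()
after_commit = record_access(111)  # a new transaction sees a first access
```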

Barriers/Lock Module

In one embodiment, barriers, locks, meta-data, or instrumentation code associated with lines in memory 110 are checked based on whether accesses to those lines are the first accesses to the lines or subsequent accesses, i.e. not the first accesses, to those lines during execution of a transaction. In the section on an embodiment of a system, an example of re-vectoring to a barrier associated with a line of memory is discussed in detail. A barrier includes any method of impeding access to a shared memory line/location, any execution of another section of code not within a transaction associated with the shared memory line/location, or access to other data, such as meta-data, associated with the shared memory line/location.

As a first example, a barrier includes a physical tri-state or other hardware blocking mechanism. As another example, a barrier includes updating a state of a carry flag associated with a shared location. A barrier may include meta-data as well. Meta-data is any logical values or data stored in a location associated with the shared memory location. One example of meta-data is a lock, where the location of a lock stores data to represent a state of the lock. Independently, the data stored may not have a specific meaning, but by construct, through either hardware or software, the value of the data stored exhibits the functionality of a lock. Therefore, the use of meta-data is not limited to the examples of locks discussed below, but may include any data accessed upon a first access to a shared memory line. In one embodiment, a lock module, such as lock module 115, is a barrier. In addition, a barrier may also include bookkeeping associated with validating memory locations before committing transactions, as discussed below in the commitment module section.

Lock module 115 is illustrated in processor 100 and coupled to memory 110, but it is not so limited. In one example, lock module 115 includes an array of locks. In one embodiment, the array of locks is an array of software locks stored in a memory, such as a local memory on processor 100 or a system memory coupled to processor 100. Here, the use of the term lock refers to a programming construct to not allow access to a resource, processor, logical processor, core, or thread, based on the lock or a value represented by the lock.

Referring again to FIG. 2, one embodiment of lock module 115 is illustrated where an array of locks, i.e. hash table 215, is stored in transactional memory. Lock 218 is associated with line 212 in memory 210, while locks 216 and 217 are associated with elements 211 c and 211 d, respectively. Association of locks with shared memory lines, locations, or elements may be done through any mapping or other technique for associating two locations. In one embodiment, a lock is associated with a shared memory line through a hash table, such as hash table 215. In this case, an array of locks is stored in memory and indexed by some portion of an address referencing the shared memory line. For example, a first number of lower bits of a virtual or linear address are masked off to get a cache line address referencing line 212, and that cache line address is used to index lock 218 within the array of locks.
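As a non-limiting illustration, the indexing of an array of locks by a cache line address may be sketched in Python as follows. The line size of 64 bytes, the array size of 1024 locks, and the use of a modulo as the hash are assumptions chosen for the sketch; the description does not specify them.

```python
LINE_OFFSET_BITS = 6   # assumed 64-byte lines: low 6 bits select a byte
NUM_LOCKS = 1024       # assumed size of the array of locks


def lock_index(address):
    """Map an address to the index of its lock in the array of locks."""
    # Discard the lower bits to obtain the cache line address.
    cache_line_address = address >> LINE_OFFSET_BITS
    # Assumed hash: fold the line address into the bounds of the array.
    return cache_line_address % NUM_LOCKS


same_line_a = lock_index(0x1000)  # first byte of a line
same_line_b = lock_index(0x103F)  # last byte of the same line
next_line = lock_index(0x1040)    # first byte of the following line
```

Because the offset bits are discarded, every address within one line maps to the same lock, while distinct lines may still collide when the array is smaller than the number of lines.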

A lock, such as lock 218, may have multiple states. As a specific illustrative example, a software lock, such as lock 218, is in an owned state or an un-owned state. For example, when a transaction is to write to line 212, it acquires lock 218 and writes to line 212. Once acquired by a transaction, lock 218 is in an owned state, and line 212 is not accessible by other transactions or resources. Common methods of waiting, back-offs, parallel execution, and other techniques may be used if a transaction or resource is not able to acquire a lock because it is in an owned state. Any method of representing a state may be used to represent that lock 218 is owned or un-owned, such as representative values, words, or bit patterns. In one embodiment, when lock 218 represents a first value, lock 218 and line 212 are owned, and when lock 218 represents a second value, lock 218 and line 212 are un-owned.

The following example is to illustrate how a programming construct utilizes lock 218 as a barrier to line 212. When un-owned, lock 218 represents an odd version value, such as the number three. Upon a transaction acquiring lock 218, the transaction or a resource writes an even number, such as the number four, to represent that lock 218 is owned. If the transaction updates line 212, upon releasing the lock, it writes the next odd version number, i.e. the number five, to lock 218 to represent that: (1) lock 218 is again un-owned, since it currently stores an odd number; and (2) line 212 was updated from the time the last odd version value of three was stored in lock 218.
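As a non-limiting illustration, the odd/even versioning scheme of the example above may be sketched in Python as follows. The class name and methods are hypothetical, the sketch is single-threaded and omits the atomic operations a real lock would require, and restoring the previous odd value when no update occurred is an assumption not stated in the description.

```python
class VersionedLock:
    """Hypothetical lock: odd value means un-owned, even means owned."""

    def __init__(self, version=3):
        self.value = version  # starts un-owned with an odd version value

    def is_owned(self):
        return self.value % 2 == 0

    def acquire(self):
        """Return the odd version observed, or None if already owned."""
        if self.is_owned():
            return None
        observed = self.value
        self.value = observed + 1  # write the even value: now owned
        return observed

    def release(self, updated):
        if updated:
            self.value += 1  # next odd version: the line was updated
        else:
            self.value -= 1  # assumption: restore the prior odd version


lock_218 = VersionedLock(3)
observed = lock_218.acquire()         # observes 3 and writes 4
second_attempt = lock_218.acquire()   # fails: the lock is owned
owned_value = lock_218.value
lock_218.release(updated=True)        # writes 5
```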

Previously, a purely software transactional memory system potentially checks the state of locks before each access to locations in memory, remembers version values stored in the locks, and performs validation on every location every time before committing the transaction. However, in one embodiment, acceleration module 225 checks a state of lock 218 before an access to line 212 only if the access is the first access to line 212 during execution of a transaction. As stated above, a first access to line 212 during execution of a transaction may be represented by a transaction field/bit associated with line 212 representing a first value. Essentially, a lock module, such as lock module 115, is invoked to check lock 218 if the transaction field/bit associated with line 212 represents the first value.

As an example, re-vectoring to a barrier or checking a lock, if the transaction field/bit associated with line 212 represents the first value, is initiated by a synchronous or asynchronous event. Upon an access to line 212 within a transaction, a synchronous indication, such as setting a carry flag to be later inspected, or an asynchronous generation of a signal, such as a transaction miss interrupt, occurs to represent that the transaction field/bit represents the first value.

In the first situation, where a synchronous mechanism is utilized, another state, such as a carry flag, is set if the transaction field associated with line 212 represents that this access is the first access during execution of the transaction. The access to line 212 then proceeds as normal, and later, upon an application or handler checking the carry flag, a determination is made of whether to proceed to acquiring a lock, storing version values, and performing validation.

In the situation where an interrupt is generated, a handler, which may be executed on processor 200 or some firmware associated with processor 200, handles the interrupt by re-vectoring execution to the barrier, such as checking lock 218. Note that a first access to line 212 is not the only event that may cause a re-vector to a barrier, i.e. checking of lock 218, or generation of an interrupt. For example, if a cache-miss occurs, i.e. the requested line is not present in memory 210 and is to be fetched from a system memory, then the same “first access” method of locks/barriers may be invoked/repeated. A first access or cache miss is also referred to herein as a transaction miss notification.

In contrast to a “first access”, in one embodiment, if the access to line 212 is not the first access to line 212 during execution of the transaction, but is rather a subsequent access to line 212, then lock module 115 is not invoked and lock 218 is not checked before accessing line 212. Or in the alternative, the synchronous notification through a mechanism such as setting a carry flag does not occur. As a result, access to line 212 is allowed, without invoking lock module 115, i.e. checking lock 218, if the transaction field/bit associated with line 212 represents a second value. In one embodiment, allowing access to line 212 is transparent, as line 212 is simply updated by or provided to execution resources 205 without checking lock 218. As can be seen in this embodiment, transactional execution is potentially accelerated where multiple accesses to the same line in a shared memory occur within a single transaction, as subsequent accesses within the transaction to the same line need not encounter a barrier associated with the line.
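As a non-limiting illustration, the difference between a first access and a subsequent access may be sketched in Python by counting how often a barrier is invoked. The classes below are hypothetical software stand-ins for hardware behavior; the barrier is a placeholder that only counts invocations, and 0 and 1 are assumed encodings of the first and second values.

```python
class Line:
    """Hypothetical shared memory line with one transaction bit."""

    def __init__(self, data=0):
        self.data = data
        self.transaction_bit = 0  # assumed first value: not yet accessed


class Accelerator:
    """Hypothetical stand-in for the acceleration and lock modules."""

    def __init__(self):
        self.barrier_calls = 0

    def barrier(self, line):
        # Placeholder for checking the lock associated with the line.
        self.barrier_calls += 1

    def load(self, line, cache_miss=False):
        # A first access or a cache miss is a transaction miss notification.
        if line.transaction_bit == 0 or cache_miss:
            self.barrier(line)
            line.transaction_bit = 1  # assumed second value
        # Subsequent accesses reach the data without invoking the barrier.
        return line.data


accel = Accelerator()
line_212 = Line(data=42)
values = [accel.load(line_212) for _ in range(3)]
calls_after_three_loads = accel.barrier_calls
```

In this sketch three loads of the same line invoke the barrier once, which is the source of the potential acceleration when a transaction touches the same line repeatedly.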

Acceleration module 225 and other modules, such as lock module 115 andeviction tracking module 125 shown in FIG. 1, as well as a commitmentmodule may operate in a plurality of modes, such as a first aggressivemode and a second cautious mode.

In one embodiment, in an aggressive mode, a lock module or accelerationmodule 225 acquires the lock, when it is in an un-owned state, and doesnot store a version value stored in the lock. As stated in an aboveexample, an odd version value represents that the lock is not owned.Previously, the odd version number would be stored in a transactionalmemory set to enable validation upon commitment. However, in thisaggressive mode, the second version value is not stored in a localtransaction memory set, after checking the state of the lock anddetermining that the lock is un-owned. Therefore, in one aggressive modeembodiment, instead of doing complex validation before committing atransaction using complex comparison of version values, the transactionis committed, if an eviction notification (synchronous or asynchronous)is not received during a pendancy of the transaction. Eviction andeviction notifications will be discussed in more detail in the evictionmodule section.

As a first illustrative example, a load instruction in a transaction is executed to access element 211 d in line 211. If access field 221 represents that the load instruction is not the first access to element 211 d during execution of the transaction, then element 211 d is accessed without checking lock 216. However, if access field 221 represents that the load instruction is the first access to line 211 during execution of the transaction, then a transaction miss interrupt is generated or a carry flag is set. A handler handles the transaction miss interrupt by checking lock 216, or an application inspects the carry flag and calls the handler. If an odd version value is stored in lock 216, then the transaction may acquire lock 216 by writing an even value to lock 216. When in an aggressive mode, the odd version value, which was stored in lock 216 before writing the even value to lock 216, is not stored. As discussed later, upon committing the transaction, the version values are not compared, saving the validation cost associated with executing the transaction.
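
A minimal sketch of the aggressive-mode acquisition in this example follows. It assumes the lock is a single 32-bit word and ignores concurrency; a real implementation would presumably need an atomic compare-and-swap. The names `lock_is_unowned` and `aggressive_acquire` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding from the text: odd value = un-owned version,
 * even value = owned. */
static bool lock_is_unowned(uint32_t lock_word)
{
    return (lock_word & 1u) != 0;
}

/* Aggressive mode: acquire the lock by writing an even value. The odd
 * version that was in the lock is deliberately not saved anywhere.
 * Returns true if the lock was acquired. */
static bool aggressive_acquire(uint32_t *lock_word, uint32_t even_owner_value)
{
    if (!lock_is_unowned(*lock_word))
        return false;                 /* another transaction owns it */
    *lock_word = even_owner_value;    /* even value marks it as owned */
    return true;
}
```

With a lock holding the odd value 9, the first call succeeds and a second call fails because the lock now holds an even value.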

Alternatively, acceleration module 225 and other modules may operate in a cautious mode. For example, an aggressive mode is the default mode of operation, and after aborting or failing a transaction a predetermined number of times, operation is switched to the cautious mode. Note that a cautious mode may instead be the default mode of operation, and operation switches to an aggressive mode after a predetermined number of successful transactions without contention or eviction notifications. In the cautious mode, upon a transaction miss notification, the same functions of the aggressive mode occur, except the version number stored in lock 216 is stored in a local transaction memory set to enable validation upon committing the transaction. Here, if there is no eviction notification during execution of the transaction, then the transaction is committed. However, unlike the aggressive mode, if an eviction notification occurs during the pendency of the transaction, then the version numbers stored in the local transaction memory set are used to validate the transaction, instead of just aborting the transaction.

Continuing the example from above, if the load instruction occurs during operation in a cautious mode, lock 216 is acquired and a version value stored in lock 216 is stored in a local transaction memory read set. If an eviction notification occurs during the pendency of the transaction, then the version value stored in the local transaction memory read set is validated against a current version value stored in lock 216. If the version value validation is successful, the transaction is committed, and if it is not successful, the transaction is aborted.
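
The cautious-mode bookkeeping can be sketched as a read set that records the version seen at first access and later compares it with the lock. The sketch models only recording and comparison and leaves out lock acquisition; `read_entry`, `read_set`, `read_set_log`, `read_set_validate`, and the capacity `READ_SET_MAX` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define READ_SET_MAX 16   /* hypothetical capacity */

typedef struct {
    const uint32_t *lock;    /* lock word the version was read from */
    uint32_t seen_version;   /* version observed at first access */
} read_entry;

typedef struct {
    read_entry entries[READ_SET_MAX];
    size_t count;
} read_set;

/* Record the version at first access; returns false if the set is full. */
static bool read_set_log(read_set *rs, const uint32_t *lock)
{
    if (rs->count == READ_SET_MAX)
        return false;
    rs->entries[rs->count].lock = lock;
    rs->entries[rs->count].seen_version = *lock;
    rs->count++;
    return true;
}

/* Validation: every recorded version must still match its lock word. */
static bool read_set_validate(const read_set *rs)
{
    for (size_t i = 0; i < rs->count; i++)
        if (*rs->entries[i].lock != rs->entries[i].seen_version)
            return false;
    return true;
}
```

Validation succeeds while the lock still holds the recorded version and fails once another writer has advanced it.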

In one embodiment, stores operate in the same manner in both anaggressive mode and a cautious mode. Here, upon a miss notification, anold value of element 211 d is stored/logged in transactional memory andlock 216 is acquired by writing an even number to it. Note thatexecution time is also potentially reduced in that even stores arechecked upon the first access to a line, and not during subsequentstores to the same line.
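
A store on a miss notification might look like the following sketch. The undo log layout, the rollback routine, and the choice to mark ownership by incrementing the odd version to the next even number are assumptions made for illustration; `undo_entry`, `undo_log`, `txn_first_store`, and `undo_rollback` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNDO_MAX 16   /* hypothetical log capacity */

typedef struct { int *addr; int old_value; } undo_entry;
typedef struct { undo_entry e[UNDO_MAX]; size_t n; } undo_log;

/* First store to a line: log the old value, acquire the lock by making
 * its value even, then perform the write. */
static bool txn_first_store(undo_log *log, uint32_t *lock_word,
                            int *addr, int value)
{
    if ((*lock_word & 1u) == 0 || log->n == UNDO_MAX)
        return false;              /* lock is owned, or the log is full */
    log->e[log->n].addr = addr;
    log->e[log->n].old_value = *addr;
    log->n++;
    *lock_word += 1;               /* odd -> even: mark as owned */
    *addr = value;
    return true;
}

/* Assumed abort path: restore logged values in reverse order. */
static void undo_rollback(undo_log *log)
{
    while (log->n > 0) {
        log->n--;
        *log->e[log->n].addr = log->e[log->n].old_value;
    }
}
```

Subsequent stores to the same line would skip `txn_first_store` entirely, because the transaction bit for the line is already set.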

Eviction Tracking/Commitment Module

As stated above, in an aggressive mode, an eviction notification duringexecution of a transaction may result in aborting the transaction, whilean eviction notification may initiate validation during cautious modeoperation. In one embodiment an eviction notification occurs if a linethat has been accessed by a transaction, i.e. the transaction accessfield represents a second value, is evicted. Eviction of a line includeseviction of a shared memory line by a remote resource, a snoop to theshared memory line by a remote resource, an access to the shared memoryline invalidating a copy of the shared memory line stored in atransaction memory set associated with the transaction, and/or aneviction due to capacity constraints. Therefore, an access by anothertransaction evicting the shared memory line, an access by a remoteresource, such as another core/logical processor present on processor200, any other invalidating access, or capacity constraints results inan eviction notification.

Tracking module 125 shown in FIG. 1, is to track eviction notification.In one embodiment, upon an eviction notification event, an evictionnotification interrupt is generated, which causes execution to bere-vectored to a handler. Logic or other interrupt generating componentsmay be used to generate the eviction notification interrupt upondetecting an eviction notification event. In another embodiment, a countis maintained of the number of shared memory lines evicted, which hadtheir transaction bits set. An application or handler may later inspectthis count and decide whether to re-vector to barriers, such asperforming validation based on the count. Here, an eviction notificationis generated based on the inspection/query of the number of linesevicted during execution, which may be stored in logic, such as acounter. As stated above, the handler may immediately abort thetransaction, abort the transaction at the end of the transaction beforecommitment, perform validation, and/or commit the transaction.
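
The counter-based alternative can be sketched as follows, assuming a callback that fires whenever a line leaves the shared memory. The names `ev_txn_bit`, `evicted_marked`, `on_line_evicted`, and `eviction_pending` are hypothetical.

```c
#include <stdbool.h>

#define EV_LINES 8   /* hypothetical number of tracked lines */

static bool ev_txn_bit[EV_LINES];   /* set when the transaction touched the line */
static unsigned evicted_marked;     /* evicted lines whose bit was set */

/* Called when a line is evicted for any reason; only lines accessed by
 * the transaction are counted. */
static void on_line_evicted(unsigned line)
{
    if (ev_txn_bit[line])
        evicted_marked++;
}

/* Polled by an application or handler; a nonzero count means an
 * eviction notification is pending. */
static bool eviction_pending(void)
{
    return evicted_marked != 0;
}
```

Evicting a line the transaction never touched leaves the count at zero, so no notification results from it.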

In one embodiment, eviction tracking module is to abort a transaction,if the eviction tracking module is operating in an aggressive mode andan invalid access, i.e. an eviction notification event, to the sharedmemory line occurs during execution of the transaction. In contrast, ifthe eviction tracking module is operating in a cautious mode then theeviction module or commitment module validates the transaction.

A commitment module is to commit the transaction. The boundaries of acommitment module, eviction tracking module, and other modules overlapand include some of the same components. For example, upon commitment,the transaction fields/bits set during execution of a transaction arereset to a first value to assure the next transaction starts from adefault state of transaction bits. Similar operation may be done duringabort of a transaction before re-execution.

Operation of eviction of lines, commitment, and validation are notdiscussed in detail to avoid obscuring the invention, as they arewell-known techniques within transactional execution. As stated above,validation, in one embodiment, includes validating a local copy of aversion number with a current version number stored in a lock.Additionally, the operation of handlers generally and the execution ofhandler routines on processors with execution resources are notdiscussed in detail to avoid obscuring the invention. However, in oneembodiment, a handler to abort a transaction is capable of rolling backnested transactions at a granularity of one transaction at a time.Execution and rolling-back of nested transactions is discussed in aco-pending application with Ser. No. 11/323,092 entitled, “SoftwareAssisted Nested Hardware Transactions.”

An Embodiment of a System

Turning to FIG. 3, an embodiment of a system with an abstraction ofpseudo code to illustrate operation of the system is illustrated.Multi-resource processor 300 is coupled to system memory 330. Althoughnot shown, system memory 330 may be coupled to processor 300 throughother components or devices, such as a memory controller hub. Systemmemory includes any memory for storage in a system such as a SRAM, DRAM,double data rate (DDR) RAM, non-volatile (NV) RAM, EDO RAM, or othermemory device. System memory 330 is to store elements, such asinstructions and data operands to be executed by processor 300. In oneembodiment, system memory stores a plurality of instructions that are tobe grouped into transactions.

Pseudo code 350 illustrates a simplified exemplary operation of processor 300 to accelerate transactional execution. Transaction 351, which includes a plurality of instructions or operations, is to be executed by resources 305 and 306. Resources 305 and 306 are each any combination of the following: a core, a thread, a logical processor, or other execution resources. Often a transaction, such as transaction 351, is to be executed by one resource or identified with one resource of the plurality of resources. One of the accesses within transaction 351 is memory access 352, which includes an access to line 311 in shared memory 310. A memory access includes a write, read, store, or load to/from shared memory 310.

Upon executing access 352, transaction bit 326 is checked to determine if access 352 is a first access to line 311 during execution of transaction 351. If transaction bit 326 represents a second value, which is either a high or low logical value depending on the choice in design, then line 311 is accessed without re-vectoring execution to barrier 317, which is associated with line 311 through a hashing function into array of barriers 315. Storing the second value in transaction bit 326 represents that a previous access to line 311 occurred during execution of transaction 351. Therefore, barrier 317 is not re-checked.

In contrast, if transaction bit 326 represents a first value, indicating that access 352 is the first access to line 311 during execution of the transaction, or a cache-miss to line 311 occurs, then execution is re-vectored to barrier 317. In one embodiment, execution is re-vectored by generating a user-level interrupt based on transaction bit 326 representing the first value. In another example, a carry flag associated with transaction bit 326 or with line 311 is set, which is then inspected by an application to decide whether to re-vector execution to a handler. Assuming barrier 317 includes a lock in an array of locks maintained in a memory, then lock 317 is checked to determine if it is in an owned or un-owned state. Here, an owned state is represented by an even value, and transaction 351 is not able to acquire a lock to line 311 through lock 317. In contrast, an un-owned state is represented by an odd version number, such as a binary representation of a decimal number nine.
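
The mapping from a line to its lock and the parity test on the lock value can be sketched as below. The hash (line address modulo a power-of-two array size), the 64-byte line size, and the names `lock_array`, `lock_index`, `lock_for`, and `lockword_owned` are assumptions; the text does not specify a particular hashing function.

```c
#include <stddef.h>
#include <stdint.h>

#define LOCK_ARRAY_SIZE 1024u   /* hypothetical; must be a power of two */
#define LINE_SHIFT 6            /* assumes 64-byte lines */

static uint32_t lock_array[LOCK_ARRAY_SIZE];

/* Hash an address to a lock index; all addresses within one line map
 * to the same lock. */
static size_t lock_index(uintptr_t addr)
{
    return (size_t)((addr >> LINE_SHIFT) & (LOCK_ARRAY_SIZE - 1u));
}

/* Return the lock associated with the line containing addr. */
static uint32_t *lock_for(uintptr_t addr)
{
    return &lock_array[lock_index(addr)];
}

/* Even value = owned; odd value = un-owned version number. */
static int lockword_owned(uint32_t word)
{
    return (word & 1u) == 0;
}
```

Under these assumptions, a lock holding decimal nine reads as un-owned, and any address within the same 64-byte line resolves to that lock.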

If acceleration module, which includes transaction bits and an executingsoftware transactional memory handler, is operating in an aggressivemode, then lock 317 is acquired through writing an even number to lock317. Yet, the version number in lock 317 is not remembered/stored.However, if in a cautious mode, the version number is stored in a localtransaction memory set, such as a read set for a load operation. Next,in both operational modes execution flow is returned to perform thememory access 352 to line 311.

If during execution an eviction notification is received, then transaction 351 is either aborted at that time or upon commitment. Here, the portion of pseudo code 350 is shown under a commit transaction section; however, an eviction notification may be generated and received in the middle of execution of a transaction before commitment and the transaction may be aborted at that time. For example, if line 311 is snooped by, evicted by, or invalidly accessed by resource 306, which is not tasked with executing transaction 351, then an eviction notification interrupt is generated. A handler receiving the interrupt may abort the transaction at that time or wait until an attempt to commit transaction 351 before handling the interrupt. Alternatively, a counter is used to keep track of the number of lines evicted that had their associated transaction bit set. At any time during the pendency of the transaction, the counter may be queried and the transaction aborted based on the value of the counter.

In an asynchronous interrupt case, if in an aggressive mode and aneviction interrupt is received, then transaction 351 is aborted andpotentially restarted. The transaction bits set previously bytransaction 351 are reset/cleared and locks obtained are released. Incontrast, if no eviction interrupt is received during aggressive mode,transaction 351 is committed, which potentially saves the execution timeof validating each address accessed during transaction 351. Upon commit,the locks are still released and the transaction bits reset. Ifoperating in a cautious mode and an eviction interrupt is received thenthe read set is validated, which is enabled by the previous stores ofversion numbers in the read set from above. If the validation issuccessful then transaction 351 is committed including releasing thelocks and clearing the transaction bits. If the validation is notsuccessful, transaction 351 is aborted and restarted. Yet, if noeviction interrupt is received, even in cautious mode, validation may bespared and transaction 351 committed without incurring extra validationexecution time.
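
The four cases above reduce to a small decision function. The sketch below is illustrative only; `txn_mode`, `txn_outcome`, and `decide_commit` are hypothetical names, and `read_set_valid` stands for the result of a validation that would only be computed in cautious mode after an eviction.

```c
#include <stdbool.h>

typedef enum { MODE_AGGRESSIVE, MODE_CAUTIOUS } txn_mode;
typedef enum { OUTCOME_COMMIT, OUTCOME_ABORT } txn_outcome;

/* Decide the fate of a transaction at commit time. Locks are released
 * and transaction bits cleared on either outcome (not modeled here). */
static txn_outcome decide_commit(txn_mode mode, bool eviction_seen,
                                 bool read_set_valid)
{
    if (!eviction_seen)
        return OUTCOME_COMMIT;     /* no validation needed in either mode */
    if (mode == MODE_AGGRESSIVE)
        return OUTCOME_ABORT;      /* no versions were stored to validate */
    return read_set_valid ? OUTCOME_COMMIT : OUTCOME_ABORT;
}
```

The first branch captures the main saving: when no eviction occurs, neither mode spends any time on validation.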

AN EMBODIMENT OF A METHOD FOR ACCELERATING TRANSACTIONAL EXECUTION

Turning to FIG. 4, an embodiment of a flow diagram for a method ofaccelerating transactional execution is illustrated. In flow 405, amemory access instruction within a transaction is executed. The memoryaccess instruction references a location in the shared memory. Thereference to the location in the shared memory may include a virtual orlinear address referencing the shared memory location itself or someexternal memory location associated with the shared memory location. Asstated above the shared memory includes any shared memory device on aprocessor, such as a cache, register, or other storage element.

Next, in flow 415 a value of a transaction bit associated with theshared memory location is determined. The transaction bit may beassociated through a mapping to the shared memory location, or it may bea part of the shared memory location. Determining the value of atransaction bit includes any known method of reading/detecting the valueof a storage cell, such as a logic level. For example, upon executingthe access instruction, the value is read from the transaction bit. Ifthe transaction bit associated with the location is a first value thenthe location is accessed without determining a state of a lock mapped tothe location in the shared memory in flow 420. Therefore, the accessoccurs like a normal load, store, read, or write.

However, if the transaction bit associated with the location represents a second value, then a state of the lock is determined in flow 425. From above, the state of the lock is represented by values representing owned or un-owned states. As an illustrative example, the lock is owned if an even value is stored in the lock, and the lock is available, i.e. not owned, if an odd version value is stored in the lock. Alternatively, another value, such as a transaction ID or resource ID, may be written to the lock to represent that it is owned. In flow 430, the lock is acquired and the location is accessed, if the state of the lock represents an un-owned state. Continuing the illustrative example, a lock is acquired by writing an even number to the lock. Finally, the transaction bit associated with the location is set to the first value to represent that the location has already been accessed a first time during execution of the transaction.

Note that the flow is illustrated in a linear fashion; however, any flow may occur in a different order than shown. For example, immediately after determining that the value of the transaction bit represents the second value, the transaction bit may be set to the first value to represent that the location has been accessed a first time.

In one embodiment, if operating in an aggressive mode, the version valuestored in the lock to represent an un-owned state is not stored beforeaccessing the location. Alternatively, in a cautious mode, the versionvalue stored in the lock to represent an un-owned state is stored in alocal transaction memory set before accessing the location.

In flow 440, which may also occur at any time during the flow shown in FIG. 4, an eviction interrupt is generated, if the location is evicted and the transaction bit associated with the location represents the first value. Here, if the location has been accessed during execution of the transaction, as represented by the transaction bit storing the first value, and is evicted, then an eviction interrupt is generated. This interrupt may be generated at the time of the eviction and handled at that time or later. Alternatively, in flow 440 a counter is incremented to keep track of a number of lines evicted during execution of the transaction. The counter may be later examined or queried to decide storing of versions, validation, and commitment, as discussed below.

In an aggressive mode and in the cautious mode, the transaction iscommitted if no eviction occurs during execution of the transaction.Additionally, in the aggressive mode, the transaction is aborted if aneviction does occur during execution of the transaction. However, in thecautious mode the transaction is validated before committing thetransaction, if an eviction interrupt is generated. If the validation issuccessful then the transaction is committed, otherwise, it is abortedand restarted. Note either the aggressive mode or the cautious mode isthe default mode, as discussed above.
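
Switching between the modes can be sketched as a small policy object, here for the case where the aggressive mode is the default. The threshold field and the names `policy_mode`, `mode_policy`, and `policy_on_abort` are hypothetical; the text only states that the switch occurs after a predetermined number of aborts.

```c
typedef enum { POLICY_AGGRESSIVE, POLICY_CAUTIOUS } policy_mode;

typedef struct {
    policy_mode mode;       /* current operating mode */
    unsigned aborts;        /* aborts observed since the last mode change */
    unsigned abort_limit;   /* hypothetical predetermined threshold */
} mode_policy;

/* Record an abort; fall back to the cautious mode once the
 * predetermined number of aborts has been reached. */
static void policy_on_abort(mode_policy *p)
{
    if (p->mode == POLICY_AGGRESSIVE && ++p->aborts >= p->abort_limit) {
        p->mode = POLICY_CAUTIOUS;
        p->aborts = 0;
    }
}
```

A symmetric counter of successful transactions could drive the opposite switch when the cautious mode is the default.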

ANOTHER EMBODIMENT OF A METHOD FOR ACCELERATING TRANSACTIONAL EXECUTION

Referring lastly to FIG. 5, another embodiment of a flow diagram for amethod of accelerating transactional execution is illustrated. In flow505, it is determined if an access referencing a location in a sharedmemory is a first access to the location during a pendancy of atransaction. In one embodiment, the location is a shared memory line. Inanother embodiment, a shared memory line is capable of storing aplurality of elements, such as instructions, operands, data operands,logical values, and any combination thereof, and an element in theshared memory line is the location.

The location is associated with a transaction field, which has at leasta transaction bit. Similar to the operation above, the transactionbits/transaction field are/is used to determine if the access is a firstaccess to the shared memory line or element within the shared memoryline during a pendancy of the transaction.

In flow 510, if it is the first access to the shared memory line or theelement in the shared memory line, then execution is revectored to abarrier associated with a location in the shared memory. The barrierincludes any locking or access mechanism associated with a shared memorysuch as a cache. In one embodiment, the barrier includes a lock withinan array of locks maintained in software. The operation of locks andbarriers are similar to the operation of locks and barriers discussedabove in reference to FIGS. 1-3. For example, re-vectoring executionincludes executing a handler to handle a transaction miss interrupt,wherein the execution of the handler is the re-vectoring of execution toa barrier, even without accessing or checking a lock. In thealternative, re-vectoring execution includes checking a state of a carryflag and calling a handler to handle the transaction miss.

Otherwise, in flow 515, if the access is a subsequent access to the shared memory line or the element in the shared memory line, then the location in the shared memory line is accessed without re-vectoring execution to the barrier associated with the location in the shared memory. Here, an operation such as a load or store operates normally without barriers.

As illustrated above, acceleration of transactional execution is accomplished in a number of ways. For example, a barrier, such as a lock within an array of software locks, is only accessed upon a first access to a shared memory location within a transaction. Subsequent accesses may directly access the location without incurring the execution hit of accessing a barrier. Furthermore, different operational modes provide different levels of acceleration. In an aggressive mode, version numbers of locks are not stored, so no validation execution delay is incurred upon committing the transaction. In fact, the transaction is just committed if no eviction interrupts occur during execution of a transaction. In contrast, in a cautious mode, version numbers are stored to perform validation if necessary. However, just as in the aggressive mode, the execution hit associated with validation is not incurred if no eviction interrupts are generated during execution of the transaction. Therefore, the accessing of locations, the barriers associated with those accesses, and the potential validation of a transaction before commitment are all accelerated.

The embodiments of methods, software, firmware or code set forth abovemay be implemented via instructions or code stored on amachine-accessible or machine readable medium which are executable by aprocessing element. A machine-accessible/readable medium includes anymechanism that provides (i.e., stores and/or transmits) information in aform readable by a machine, such as a computer or electronic system. Forexample, a machine-accessible medium includes random-access memory(RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic oroptical storage medium; flash memory devices; electrical, optical,acoustical or other form of propagated signals (e.g., carrier waves,infrared signals, digital signals); etc.

In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. Furthermore, the foregoing use of embodiment and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.

1. An apparatus comprising: a shared memory including a plurality ofshared lines; an execution module to execute a plurality of operationsgrouped into a transaction, wherein one of the plurality of operationsincludes an access to a shared line of the plurality of shared lines; alock module, when invoked, to check the state of a meta-data locationassociated with the shared line; and an acceleration module to invokethe lock module, if the access to the shared line is the first access tothe shared line during execution of the transaction, and allow access tothe cache line, without invoking the lock module, if the access to theshared line is not the first access to the shared line during executionof the transaction.
 2. The apparatus of claim 1, further comprising aneviction tracking module to abort the transaction, if the evictiontracking module is operating in an aggressive mode and tracks aneviction notification during execution of the transaction, and tovalidate the transaction, if the eviction tracking module is operatingin a cautious mode and tracks an eviction notification during executionof the transaction.
 3. The apparatus of claim 2, wherein the evictiontracking module comprises: logic to generate the eviction notification,wherein the eviction notification represents an access selected from agroup consisting of an eviction of the shared memory line by a remoteresource, an eviction of the shared memory line due to capacityconstraints, an eviction of the shared memory line due to a snoop to theshared memory line by a remote resource, and an eviction of the sharedmemory line due to an access to the shared memory line invalidating acopy of the shared memory line stored in a transaction memory setassociated with the transaction, and a handler to abort the transactionbased on the eviction notification, if the eviction tracking module isoperating in the aggressive mode, and to validate the transaction basedon the eviction notification, if the eviction tracking module isoperating in the cautious mode.
 4. The apparatus of claim 1, wherein themeta-data location is a lock.
 5. The apparatus of claim 4, wherein thelock is in an array of locks, and wherein the lock indexed in the arrayof locks through a hash value of an address referencing the sharedmemory line.
 6. The apparatus of claim 5, wherein the state of the lockis a first owned state, which is represented by a first value stored inthe lock, if the lock is owned, and the state of the lock is a secondun-owned state, which is represented by a second version value stored inthe lock, if the lock is not owned.
 7. The apparatus of claim 6, whereinthe first value is an even integer, and wherein the second version valueis an odd integer.
 8. The apparatus of claim 6, wherein the lock module,by default, is to operate in an aggressive mode, and wherein the lockmodule is to operate in a cautious mode, if the transaction is aborted afirst number of times.
 9. The apparatus of claim 8, wherein the lockmodule, when operating in the aggressive mode, is to acquire the lockand not to store the second version value in a local transaction memoryset, after checking the state of the lock, if the lock is in the secondun-owned state.
 10. The apparatus of claim 8, wherein the lock module,when operating in the cautious mode, is to acquire the lock and storethe second version value in a local transaction memory set, afterchecking the state of the lock, if the lock is the second un-ownedstate, and validate the stored second version value, upon committing thetransaction.
 11. The apparatus of claim 1, wherein the acceleration module is also to determine if the access to the shared line is the first access to the shared line during execution of the transaction, and wherein determining if the access to the shared line is the first access to the shared line during execution of the transaction comprises: checking a transaction bit associated with the shared memory line, wherein the transaction bit represents a first value, if the shared memory line has not been accessed during execution of the transaction, and the transaction bit represents a second value, if the shared memory line has been previously accessed during execution of the transaction.
 12. The apparatus of claim 11, wherein allowing access to the cache line, without invoking the lock module, if the access to the shared line is not the first access to the shared line during execution of the transaction comprises: providing the cache line to the execution module, without invoking the lock module to check the state of the lock, if the transaction bit represents the second value.
 13. The apparatus of claim11, wherein the transaction bit is associated with the shared memoryline through being a bit within the shared memory line.
 14. Theapparatus of claim 1, wherein the shared memory is a cache memory sharedbetween at least two resources present on a microprocessor, and whereinthe execution module includes a fixed point unit to perform fixed datapoint operations and a floating point unit to perform floating datapoint operations.
 15. An apparatus comprising: a processor including a cache memory including a plurality of cache lines; execution resources to execute a transaction, the transaction including a first instruction to access a first cache line of the plurality of cache lines, which is associated with a first transaction field and a first lock; an acceleration module to check a state of the first lock before the access to the first cache line, if the first transaction field represents a first value, not check the state of the first lock before the access to the first line, if the first transaction field represents a second value, and set the first transaction field to represent the second value upon the access to the first cache line, if the access is the first access to the first cache line during execution of the transaction.
 16. The apparatus of claim 15, wherein each cache line of the plurality of cache lines is capable to store a plurality of elements, and wherein the first transaction field includes a plurality of transaction bits, each of the transaction bits corresponding to one of the plurality of elements in the first cache line.
 17. The apparatus of claim 16, whereineach of the plurality of elements is individually selected from a groupconsisting of an instruction, an operand, and a grouping of logicalvalues, and wherein the first cache line is associated with a first lockthrough mapping at least one element of the plurality of elements in thefirst cache line to the first lock.
 18. The apparatus of claim 15,wherein the first cache line is associated with the first lock through ahash table, the first lock being indexed in the hash table with aportion of an address referencing the first cache line.
 19. Theapparatus of claim 15, wherein the first transaction field includes atransaction bit, and wherein the first value is a high logical value andthe second value is a low-logical value.
 20. The apparatus of claim 15,wherein the first instruction is a load instruction, and wherein theacceleration module, when operating in a first mode, does not store alocal copy of a version number stored in the first lock, upon acquiringthe first lock, and when operating in a second mode, stores a local copyof a version number stored in the first lock, upon acquiring the firstlock.
 21. The apparatus of claim 20, wherein the processor furtherincludes a commitment module, the commitment module to validate thelocal copy of the version number to determine if the first line isevicted before the transaction is committed, if the acceleration moduleis operating in the second mode; and reset the first transaction fieldto the first value, upon committing the transaction.
 22. The apparatusof claim 15, wherein the execution resources are also to execute ahandler routine to abort the transaction, if the first line is evictedbefore the transaction is committed, and wherein the handler routine iscapable of rolling back nested transactions at a granularity of onetransaction.
 23. The apparatus of claim 15, wherein the processor isselected from a group consisting of a host processor, a microprocessor,a processing core, a logical processor, and an embedded processor, amulti-threaded processor, and a multi-core processor.
 24. A systemcomprising: a multiple-resource microprocessor including a cache memoryincluding a plurality of cache lines; an execution unit to execute atransaction, the transaction including a plurality of accesses to acache line of the plurality of cache lines; an acceleration module tore-vector execution, by the execution unit, to a barrier associated withthe cache line, upon a first access of the plurality of accesses to thecache line during a pendancy of the transaction, and allow a subsequentaccess of the plurality of accesses to the cache line during thependancy of the transaction, without re-vectoring execution to thebarrier associated with the cache line; a system memory coupled to themulti-resource microprocessor to store elements to be loaded into theplurality of cache lines in the cache memory.
 25. The system of claim24, wherein each resource of the multiple resources in themultiple-resource processor is selected from a group consisting of aprocessor core, a logical processor, a processor thread, and a physicalprocessor, and wherein the system memory is a memory device selectedfrom a group consisting of a static random access memory (SRAM), dynamicrandom access memory (DRAM), a double data rate random access memory(DDR RAM), and a buffered random access memory (RAM).
 26. The system of claim 24, wherein re-vectoring execution by the execution unit to a barrier associated with the cache line includes setting a carry flag to a first value, if the cache line has not been accessed a first time during pendency of the transaction; inspecting the carry flag; and calling a handler, after inspecting the carry flag, to re-vector execution by the execution unit to the barrier, if the carry flag represents the first value.
 27. The system of claim 24, wherein re-vectoring execution by the execution unit to a barrier associated with the cache line includes generating an interrupt, if the cache line has not been accessed a first time during pendency of the transaction; and handling the interrupt with a handler, the handler to re-vector execution by the execution unit to the barrier.
 28. The system of claim 27, wherein determining if the cache line has been accessed a first time during pendency of the transaction includes: checking a transaction bit associated with the cache line; determining the cache line has been accessed a first time during pendency of the transaction, if the transaction bit represents a first logical value; and determining the cache line has not been accessed a first time during pendency of the transaction, if the transaction bit represents a second logical value.
 29. The system of claim 28, wherein the transaction bit is changed from the first logical value to the second logical value, upon the first access to the cache line during pendency of the transaction, and wherein the transaction bit is reset to the first logical value, upon commitment of the transaction.
 30. The system of claim 27, wherein execution by the execution unit is also re-vectored to a barrier associated with the cache line, if a cache-miss occurs.
 31. The system of claim 27, wherein the barrier includes a lock within an array of locks, the lock being indexed in the array of locks by at least a portion of an address referencing the cache line.
 32. The system of claim 31, wherein the lock represents an even number, if the lock is owned by a resource in the multi-resource processor, and an odd version number to represent a version of the cache line, if the lock is not owned by a resource in the multi-resource processor.
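By way of illustration only, the lock arrangement recited in claims 31 and 32 may be sketched as follows. The line size, table size, owner identifiers, and the choice to advance the version by two on release are assumptions made for this sketch and are not recited in the claims; the function names are likewise hypothetical.

```python
# Illustrative sketch only; sizes and names below are assumptions.
LINE_BITS = 6     # assumption: 64-byte cache lines
TABLE_BITS = 10   # assumption: 1024-entry array of locks

# Every lock starts un-owned, holding an odd version number.
locks = [1] * (1 << TABLE_BITS)


def lock_index(address: int) -> int:
    """Index the lock array with a portion of the address (the line-number bits)."""
    return (address >> LINE_BITS) & ((1 << TABLE_BITS) - 1)


def is_owned(lock_value: int) -> bool:
    """An even value means owned; an odd value is a version number."""
    return lock_value % 2 == 0


def acquire(address: int, owner_id: int) -> int:
    """Write an even value to the lock and return the prior odd version."""
    i = lock_index(address)
    old = locks[i]
    if is_owned(old):
        raise RuntimeError("lock already owned")
    locks[i] = owner_id * 2  # any even value marks the lock as owned
    return old


def release(address: int, old_version: int) -> None:
    """Assumption: releasing publishes the next odd version number."""
    locks[lock_index(address)] = old_version + 2
```

Two addresses that fall within the same 64-byte line map to the same lock in this sketch, so a single acquisition covers every element stored in that line.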
 33. A method comprising: executing a memory access instruction within a transaction, wherein the memory access instruction references a location in a shared memory; accessing the location, without determining a state of a lock mapped to the location in the shared memory, if a transaction bit associated with the location represents a first value; if the transaction bit associated with the location represents a second value, determining the state of the lock, acquiring the lock and accessing the location, if the state of the lock represents an un-owned state, and setting the transaction bit to the first value; and generating an eviction notification, if the location is evicted and the transaction bit associated with the location represents the first value.
 34. The method of claim 33, wherein the eviction notification is generated based on an eviction event selected from a group consisting of an eviction interrupt, an inspection of an eviction counter, and an eviction due to capacity constraints.
 35. The method of claim 33, wherein the state of the lock represents the un-owned state, if the lock represents an odd integer version value, and the state of the lock represents an owned state, if the lock represents an even integer value.
 36. The method of claim 35, wherein acquiring the lock comprises writing the even integer value to the lock.
 37. The method of claim 36, further comprising: if operating in an aggressive mode: not storing the odd integer version value before writing the even integer value to the lock, committing the transaction, if an eviction notification is not generated during a pendency of the transaction, and aborting the transaction, if an eviction notification is generated during the pendency of the transaction.
 38. The method of claim 37, further comprising: if operating in a cautious mode: storing the odd integer version value before writing the even integer value to the lock, committing the transaction, if an eviction notification is not generated during a pendency of the transaction, and validating the odd integer version value before committing the transaction, if an eviction notification is generated during the pendency of the transaction.
 39. The method of claim 38, wherein the aggressive mode is a default mode, and wherein cautious mode operation occurs, if the transaction aborts a predetermined number of times.
 40. The method of claim 33, wherein accessing the location is selected from a group consisting of a read from the location, a write to the location, a load from the location, and a store to the location.
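By way of illustration only, the commit behavior of the aggressive and cautious modes recited in claims 37 through 39 may be sketched as follows. The `Transaction` class, the function names, the threshold of three aborts, and the form of validation (comparing each stored version with a current version supplied by the caller) are assumptions made for this sketch and are not recited in the claims.

```python
# Illustrative sketch only; all names and the abort threshold are assumptions.
from dataclasses import dataclass, field

AGGRESSIVE, CAUTIOUS = "aggressive", "cautious"
MAX_AGGRESSIVE_ABORTS = 3  # assumption: the "predetermined number" of aborts


@dataclass
class Transaction:
    mode: str = AGGRESSIVE                # aggressive mode is the default mode
    evicted: bool = False                 # set when an eviction notification arrives
    stored_versions: dict = field(default_factory=dict)  # lock index -> odd version


def record_version(tx: Transaction, lock_index: int, version: int) -> None:
    """Cautious mode stores the odd version before the lock is written; aggressive mode does not."""
    if tx.mode == CAUTIOUS:
        tx.stored_versions[lock_index] = version


def try_commit(tx: Transaction, current_versions: dict) -> bool:
    """Return True to commit the transaction, False to abort it."""
    if not tx.evicted:
        return True   # no eviction notification: commit without validation
    if tx.mode == AGGRESSIVE:
        return False  # eviction with no stored versions: abort
    # Cautious mode: validate every stored version against the current one.
    return all(current_versions.get(i) == v for i, v in tx.stored_versions.items())


def next_mode(abort_count: int) -> str:
    """Fall back to cautious mode once the transaction has aborted often enough."""
    return CAUTIOUS if abort_count >= MAX_AGGRESSIVE_ABORTS else AGGRESSIVE
```

In this sketch neither mode pays a validation cost unless an eviction notification has been generated, and only the cautious mode can still commit after one.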
 41. A method comprising: determining if an access referencing a location in a shared memory is a first access to the location during a pendency of a transaction; re-vectoring execution to a barrier associated with the location in the shared memory, if the access is the first access to the location during the pendency of the transaction; and accessing the location in the shared memory, without re-vectoring execution to the barrier associated with the location in the shared memory, if the access is an access subsequent to the first access during the pendency of the transaction.
 42. The method of claim 41, wherein the location is to store a plurality of elements, and wherein each of the plurality of elements is individually selected from a group consisting of an instruction, an operand, a data operand, and a grouping of logical values.
 43. The method of claim 42, wherein each element of the plurality of elements to be stored in the location is associated with a transaction bit, and wherein determining if the access referencing the location in the shared memory is a first access to the location comprises: determining at least one element of the plurality of elements referenced by the access referencing the location; checking the transaction bit associated with the at least one element of the plurality of elements; and determining the access referencing the location in the shared memory is the first access to the location, if the transaction bit associated with the at least one element represents a first value.
 44. The method of claim 41, wherein determining if the access referencing the location in the shared memory is a first access to the location comprises: checking a transaction bit associated with the location; and determining the access referencing the location in the shared memory is a first access, if the transaction bit represents a first value.
 45. The method of claim 44, wherein re-vectoring execution to a barrier associated with the location in the shared memory, if the access is the first access to the location during the pendency of the transaction comprises: generating a transaction miss interrupt, if the transaction bit represents the first value; and executing a handler to handle the transaction miss interrupt, wherein handling the transaction miss interrupt includes checking the barrier associated with the location before accessing the location.
 46. The method of claim 45, wherein the barrier associated with the location includes a lock associated with the location in a hash table, wherein the lock is indexed in the hash table with a portion of an address referencing the location in the shared memory.
 47. The method of claim 46, further comprising: accessing the location, after re-vectoring execution to the barrier and acquiring the lock associated with the location; and setting the transaction bit associated with the location to represent a second value, after accessing the location.
 48. The method of claim 47, further comprising: determining if the access is an access subsequent to the first access during pendency of the transaction, wherein determining if the access is an access subsequent to the first access comprises: checking the transaction bit associated with the location, and determining the access is an access subsequent to the first access, if the transaction bit represents the second value.
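By way of illustration only, the first-access filtering recited in claims 41, 44, 47, and 48 may be modeled in software as follows. The `TransactionalCache` class, its dictionary-based storage, and the barrier counter are assumptions made for this sketch; in the claimed method the transaction bit is checked by hardware and the barrier is reached through re-vectored execution, for example by a transaction miss interrupt.

```python
# Illustrative software model only; the class and its members are assumptions.
class TransactionalCache:
    """One transaction bit per location; the barrier runs only on the first access."""

    FIRST = 0   # first value: location not yet accessed in this transaction
    SECOND = 1  # second value: location already accessed in this transaction

    def __init__(self):
        self.memory = {}        # location -> stored element
        self.tx_bit = {}        # location -> transaction bit
        self.barrier_calls = 0  # how many times execution was re-vectored

    def barrier(self, location):
        """Stand-in for the barrier, e.g. checking and acquiring the lock."""
        self.barrier_calls += 1

    def load(self, location):
        if self.tx_bit.get(location, self.FIRST) == self.FIRST:
            self.barrier(location)               # first access: re-vector to the barrier
            value = self.memory.get(location)    # access the location
            self.tx_bit[location] = self.SECOND  # set the bit after accessing
            return value
        return self.memory.get(location)         # subsequent access: no barrier

    def commit(self):
        """Reset every transaction bit to the first value upon commitment."""
        self.tx_bit.clear()
```

For example, three loads from the same location within one transaction invoke the barrier once in this model; a load in the next transaction invokes it again because the bit was reset at commit.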